We give bounds on the average fidelity achievable by any quantum state estimator, which is arguably
the most prominently used figure of merit in quantum state tomography. Moreover, these
bounds can be computed online—that is, while the experiment is running. We show numerically 
that these bounds are quite tight for relevant distributions of density matrices. We also show that 
the Bayesian mean estimator is ideal in the sense of performing close to the bound without requiring 
optimization. Our results hold for all finite dimensional quantum systems.

I. INTRODUCTION

Inferring a quantum mechanical description of a physical system is equivalent to assigning it a quantum state, a
process referred to as tomography. Tomography is now a routine task for designing, testing and tuning qubits in the
quest to build quantum information processing devices [1]. To judge how well this task is performed, a figure of
merit must be reported. By far the most commonly used figure of merit for quantum states is fidelity
[2, 3]. Nowadays, fidelity is used to compare quantum states and processes in a wide variety of tasks, from quantum
chaos to quantum control to the continuous monitoring of quantum systems [4-10]. One might find it surprising,
then, that the technique which optimizes performance with respect to fidelity is not known.

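For a d-dimensional state space

$$S := \{\sigma \in L(\mathbb{C}^d) : \sigma \geq 0,\ \mathrm{Tr}(\sigma) = 1\}, \qquad (1)$$

the fidelity between two states $\rho, \sigma \in S$ is defined to be [2, 3]

$$F(\rho,\sigma) := \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1^2 = \left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2. \qquad (2)$$

Define the average fidelity with respect to some measure $d\rho$ as $\mathbb{E}_\rho[F(\rho,\sigma)]$ (see footnote 1). We want this average to be as large
as possible. Thus, the problem can be succinctly stated as follows:

$$\begin{aligned} \text{maximize} \quad & \mathbb{E}_\rho[F(\rho,\sigma)] \\ \text{subject to} \quad & \mathrm{Tr}(\sigma) = 1, \\ & \sigma \geq 0. \end{aligned} \qquad (3)$$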
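The fidelity in (2), which is the objective averaged in (3), can be evaluated numerically. The following is a minimal sketch, assuming NumPy and the squared-trace-norm convention of (2); the helper names `psd_sqrt` and `fidelity` are illustrative and do not come from the paper.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1^2, the squared trace norm."""
    singular_values = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(np.sum(singular_values) ** 2)

zero = np.array([[1, 0], [0, 0]], dtype=complex)         # |0><0|
plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)   # |+><+|
mixed = np.eye(2, dtype=complex) / 2                     # maximally mixed qubit

print(fidelity(zero, plus), fidelity(zero, mixed), fidelity(mixed, mixed))
```

For two pure states the function reduces to the squared overlap, which is the simplification used later for ensembles of pure states.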

In the context of tomography, we think of $\rho$ as the "true state" and $\sigma$ as the estimated state. An estimator is a
function from the space of data to quantum states, $\sigma : \text{data} \mapsto \sigma(\text{data}) \in S$, where data are the results of a sequence
of quantum measurements. Since both the true state and data are unknown, we take the expected value with respect
to the joint distribution of $(\rho, \text{data})$ to obtain the average fidelity:

$$f(\sigma) = \mathbb{E}_{\rho,\text{data}}[F(\rho,\sigma(\text{data}))]. \qquad (4)$$

We want this to be as large as possible. The estimator which maximizes this quantity is equivalent to the estimator
maximizing the following posterior average fidelity for every data set:

$$f(\sigma|\text{data}) = \mathbb{E}_{\rho|\text{data}}[F(\rho,\sigma(\text{data}))]. \qquad (5)$$

An estimator which maximizes this is called a Bayes estimator (see footnote 2). Bayes estimators are useful both to understand
Bayesian optimality and to provide upper bounds for the worst case performance.

Now here is the subtle and important point: the measurements performed, the data themselves and the distribution
from which they were generated are not important once the posterior distribution has been calculated. If we
know the solution for every measure $d\rho$, then we know the solution for the posterior measure $d\rho|\text{data}$. For brevity,
then, we will drop this conditional information from now on and the problem reduces again to (3).


II. SUMMARY OF RESULTS 

In this work, we provide absolute benchmarks for the average fidelity performance of any tomographic estimation
strategy by way of upper and lower bounds. This is important because, in the field of quantum tomography, a
common theme is to compare estimators. To date, many options are available: linear inversion [1], maximum
likelihood [12], Bayesian mean [13], hedged maximum likelihood [14], and compressed sensing [15, 16], to name a
few. Often estimators are compared by simulating measurements on ensembles of states drawn according to some
measure and averaging the fidelity. Such comparisons can only provide conclusions about the relative performance of
estimators. Our bounds, in contrast, can be used to benchmark the fidelity performance of any candidate estimator in absolute terms.

We complement our theoretical findings with numerical experiments. These demonstrate the relative tightness
of our bounds and, in particular, reveal that the Bayesian mean estimator is an excellent choice, owing to its
near-optimal performance and ease of implementation. Importantly, both the mean of the distribution and our bounds
can be computed online; that is, the estimator and its performance can be computed while data is being taken. In
the context of Bayesian quantum information theory [13], our findings lend credence to the claim that the standard
approach of using the mean of the posterior distribution as an estimator is a near-optimal one.

We note that this problem has been solved for the case of a single qubit (d = 2). Bagan et al. [6] have given the
optimal estimator (and measurement!) for any isotropic prior measure. Unfortunately, because they make heavy use of the
Bloch representation of a qubit, their methods do not generalize. In contrast, our bound holds for all distributions of
states in any dimension and coincides with the results of [6] for the case of a single qubit.


A. Ensembles of pure states 

We first present the analytically soluble case of measures supported only on pure states. Such a case is common 
in theoretical studies which average the performance of their protocols over the popular choice of the unique Haar 
invariant measure on pure states. The solution is organized into the following theorem: 


1 Expectation values will always be denoted with a subscript which specifies the implicit distribution of variables being averaged over.

2 The terminology and objective functions used here can be seen as standard generalizations of those familiar in decision theory. See, e.g., [11].

Theorem 1. Choose an arbitrary dimension d and assume that the integration measure $d\rho$ is supported only on pure states.
Then, the state which solves the optimization problem (3) is the eigenvector of $\mathbb{E}_\rho[\rho]$ with maximal eigenvalue. It achieves a
maximal fidelity of $\|\mathbb{E}_\rho[\rho]\|_\infty$.

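The proof is a simple exercise in linear programming. When $\rho$ is a pure state, the fidelity simplifies to
$F(\rho,\sigma) = \mathrm{Tr}(\rho\sigma)$. Linearity allows us to bring the expectation inside the trace so that the problem becomes

$$\begin{aligned} \text{maximize} \quad & \mathrm{Tr}(\mathbb{E}_\rho[\rho]\,\sigma) \\ \text{subject to} \quad & \mathrm{Tr}(\sigma) = 1, \\ & \sigma \geq 0. \end{aligned} \qquad (6)$$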

The solution can be found in many textbooks covering linear programming—e.g. [17]. This solution also coincides 
with the one noted for a distribution supported on two states in [18]. 
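The statement of Theorem 1 is easy to check on a small example. The sketch below, assuming NumPy and a hypothetical two-element ensemble of pure qubit states, computes the top eigenvector of the mean state and compares its average fidelity with the largest eigenvalue; the function name is illustrative only.

```python
import numpy as np

def optimal_pure_state_estimate(states, weights):
    """Top eigenvector of the ensemble mean and the largest eigenvalue of that mean."""
    mean = sum(w * np.outer(psi, psi.conj()) for psi, w in zip(states, weights))
    vals, vecs = np.linalg.eigh(mean)   # eigenvalues are returned in ascending order
    return vecs[:, -1], float(vals[-1])

# Hypothetical ensemble: |0> with probability 0.75 and |1> with probability 0.25.
states = [np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)]
weights = [0.75, 0.25]

estimate, best_fidelity = optimal_pure_state_estimate(states, weights)
# Average fidelity of the estimate, using F = |<estimate|psi>|^2 for pure states.
avg = sum(w * abs(np.vdot(estimate, psi)) ** 2 for psi, w in zip(states, weights))
print(best_fidelity, avg)
```

Both printed numbers agree, as the theorem predicts: the average fidelity achieved by the top eigenvector equals the maximal eigenvalue of the mean.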


B. General measures on mixed states 


For measures with support on mixed states, the situation is markedly different. Our main technical contribution
is a set of new upper bounds for this case. We obtain them by replacing the fidelity function (which is notoriously
difficult to grasp) in the main optimization problem (3) by quantities that are easier to handle in full generality. One
rather straightforward approach to do so is to relate the fidelity function $F(\rho,\sigma)$ between arbitrary states $\rho,\sigma \in S$ to
corresponding Schatten-p-norm distances

$$\|\rho-\sigma\|_p = \left(\mathrm{Tr}\left(|\rho-\sigma|^p\right)\right)^{1/p},$$

with $1 \leq p \leq \infty$ and $|X| = \sqrt{X^\dagger X}$ for any $X \in L(\mathbb{C}^d)$. This can be done by employing the well-known and often
used Fuchs-van de Graaf inequalities [19]

$$1 - \sqrt{F(\rho,\sigma)} \leq \frac{1}{2}\|\rho-\sigma\|_1 \leq \sqrt{1 - F(\rho,\sigma)} \qquad \forall \rho,\sigma \in S.$$

This inequality together with the hierarchy of Schatten-p-norms assures

$$F(\rho,\sigma) \leq 1 - \frac{1}{4}\|\rho-\sigma\|_1^2 \leq 1 - \frac{1}{4}\|\rho-\sigma\|_2^2 \qquad (7)$$

for any two quantum states $\rho,\sigma \in S$. Replacing the objective function in the central optimization problem (3) by
such an upper bound results in a different optimization which admits a general analytic solution. Clearly, such a
relaxed optimum bounds the original figure of merit from above and allows us to establish our second main result.

Theorem 2. For any finite dimension d and any distribution $d\rho$, the maximal average fidelity achieved by any estimator $\sigma \in S$
obeys

$$\max_{\sigma \in S} \mathbb{E}_\rho[F(\rho,\sigma)] \leq 1 - \frac{1}{4}\mathrm{Tr}\left(\mathbb{E}_\rho[\rho^2] - \mathbb{E}_\rho[\rho]^2\right). \qquad (8)$$

Note that the expression on the right hand side of (8) can be interpreted as a non-commutative generalization
of the variance of a probability distribution. Having already outlined the main ideas necessary to establish such a
result, we refer to Section IV A for a complete proof.
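The right hand side of the bound in Theorem 2 only needs the first two moments of the distribution. As a hedged sketch, assume the distribution is represented by a finite list of density matrices with normalized weights (a representation chosen here for illustration; the function name is hypothetical):

```python
import numpy as np

def fuchs_van_de_graaf_bound(particles, weights):
    """Evaluate 1 - Tr(E[rho^2] - E[rho]^2) / 4 for a weighted finite ensemble."""
    particles = np.asarray(particles, dtype=complex)   # shape (n, d, d)
    weights = np.asarray(weights, dtype=float)         # shape (n,), assumed to sum to one
    mean = np.einsum("j,jab->ab", weights, particles)
    mean_sq = np.einsum("j,jab,jbc->ac", weights, particles, particles)
    variance = np.trace(mean_sq - mean @ mean).real    # non-commutative "variance"
    return 1.0 - 0.25 * variance

zero = np.diag([1.0, 0.0])
one = np.diag([0.0, 1.0])
print(fuchs_van_de_graaf_bound([zero, one], [0.5, 0.5]))   # variance is 1/2 here
print(fuchs_van_de_graaf_bound([zero], [1.0]))             # point mass: the bound is trivial
```

For a point mass the variance term vanishes and the bound equals one, which is the trivial case mentioned later in connection with Corollary 2.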

Another way of establishing upper bounds on the average fidelity involves the concept of super-fidelity, which
provides the following upper bound on the fidelity [20]:

$$F(\rho,\sigma) \leq \mathrm{Tr}(\rho\sigma) + \sqrt{1 - \mathrm{Tr}(\rho^2)}\sqrt{1 - \mathrm{Tr}(\sigma^2)}. \qquad (9)$$

Although more involved, we shall see that such an approach yields strictly better bounds than the ones presented in
Theorem 2. For brevity, we define $\bar\rho := \mathbb{E}_\rho[\rho]$ and $p_\rho := \mathbb{E}_\rho\left[\sqrt{1 - \mathrm{Tr}(\rho^2)}\right]$, such that inequality (9) assures

$$\max_{\sigma \in S} \mathbb{E}_\rho[F(\rho,\sigma)] \leq \max_{\sigma \in S}\left(\mathrm{Tr}(\bar\rho\sigma) + p_\rho\sqrt{1 - \mathrm{Tr}(\sigma^2)}\right) \qquad (10)$$

for any distribution $d\rho$. Although more tractable than the original problem, the optimization on the right hand
side still requires solving a non-commutative maximization over all quantum states $\sigma \in S$. However, applying a
corollary of the famous Birkhoff-von Neumann theorem (see e.g. [21, Theorem 8.7.6]) allows for restricting this
optimization to density operators $\sigma$ that commute with the distribution's mean $\bar\rho$; see Lemma 1 below. If $r_1,\ldots,r_d$
denote the eigenvalues of $\bar\rho$, such a restriction assures that solving the right hand side of (10) is equivalent to

$$\begin{aligned} \text{maximize} \quad & \sum_{i=1}^d r_i s_i + p_\rho\sqrt{1 - \sum_{i=1}^d s_i^2} \\ \text{subject to} \quad & \sum_{i=1}^d s_i = 1, \\ & s_i \geq 0, \quad 1 \leq i \leq d, \end{aligned} \qquad (11)$$

which is a commutative convex optimization problem. We refer to Lemma 1 below for a detailed proof of this
assertion. Note that, if the measure $d\rho$ is supported exclusively on pure states, $p_\rho$ vanishes and (11) reduces to
Theorem 1, which is tight.

In order to obtain analytical bounds for mixed states, we further relax (11) by replacing the non-negativity
constraints ($s_i \geq 0$) by the weaker demand that the optimization vector $(s_1,\ldots,s_d)^T \in \mathbb{R}^d$ is contained in the Euclidean
unit ball, i.e. $\sum_{i=1}^d s_i^2 \leq 1$. As we shall show in Section V, such a simplification is the tightest possible ellipsoidal
relaxation of (11) and allows us to apply the method of Lagrangian multipliers in a straightforward fashion. Doing
so results in the main theoretical statement of this paper.

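Theorem 3. For any finite dimension d and any distribution $d\rho$ over states, the fidelity achieved by any estimator $\sigma \in S$ is
bounded from above by

$$\mathbb{E}_\rho[F(\rho,\sigma)] \leq \frac{1}{d}\left(1 + \sqrt{d-1}\,\sqrt{d\left(\mathbb{E}_\rho\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]^2 + \mathrm{Tr}\left(\mathbb{E}_\rho[\rho]^2\right)\right) - 1}\right). \qquad (12)$$

The matrix achieving this optimum corresponds to

$$\sigma^\sharp = \frac{1}{d}\mathbb{1} + \sqrt{\frac{d-1}{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1}}\left(\bar\rho - \frac{1}{d}\mathbb{1}\right), \qquad (13)$$

where $\bar\rho = \mathbb{E}_\rho[\rho]$, $p_\rho = \mathbb{E}_\rho\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]$, and $\mathbb{1} \in L(\mathbb{C}^d)$ denotes the identity matrix.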

Again, we content ourselves here with outlining the proof architecture necessary to establish such a result and 
refer to Section IV for a detailed analysis. 

Note that since we relaxed the maximization constraints, $\sigma^\sharp$ in general fails to be positive-semidefinite and is
thus not a valid density operator, though we do not use it as such. In particular, the bound is not tight when $d\rho$
is supported only on pure states, as might be evident from the possibility of non-positive states arising from the
$(\bar\rho - \frac{1}{d}\mathbb{1})$ term in (13). On the other hand, the distribution is known and thus in the case of a distribution supported
only on pure states, one should consult the exact solution in Theorem 1.

Conversely, if $\sigma^\sharp$ happens to be a state, it also solves the optimization (11) and the analytical bound (12) exactly
reproduces an a priori tighter one. In all of our numerical experiments, some of which are presented below, this was
indeed the case.
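Both quantities in Theorem 3 are closed-form functions of the mean state and of the average of $\sqrt{1-\mathrm{Tr}(\rho^2)}$. The sketch below evaluates them for a weighted finite ensemble and also reports whether the resulting matrix is positive semidefinite. It assumes NumPy, a particle representation of the distribution, and an ensemble that is not supported only on pure states; all names are illustrative.

```python
import numpy as np

def super_fidelity_bound(particles, weights):
    """Return the bound of Theorem 3 and the associated matrix sigma_sharp."""
    particles = np.asarray(particles, dtype=complex)   # shape (n, d, d)
    weights = np.asarray(weights, dtype=float)         # assumed to sum to one
    d = particles.shape[-1]
    mean = np.einsum("j,jab->ab", weights, particles)
    purities = np.einsum("jab,jba->j", particles, particles).real   # Tr(rho_j^2)
    p = float(weights @ np.sqrt(np.clip(1.0 - purities, 0.0, None)))
    q = d * (p ** 2 + np.trace(mean @ mean).real) - 1.0   # positive whenever p > 0
    bound = (1.0 + np.sqrt((d - 1) * q)) / d
    identity = np.eye(d)
    sigma_sharp = identity / d + np.sqrt((d - 1) / q) * (mean - identity / d)
    return bound, sigma_sharp

# Hypothetical two-particle qubit ensemble of mixed states.
particles = [np.diag([0.75, 0.25]), np.diag([0.25, 0.75])]
bound, sigma_sharp = super_fidelity_bound(particles, [0.5, 0.5])
is_state = bool(np.linalg.eigvalsh(sigma_sharp).min() >= -1e-12)
print(bound, is_state)
```

In this qubit example the matrix is the maximally mixed state, so it is a valid density operator and the bound is attained, in line with the remark above.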

It is also worthwhile to point out that super-fidelity (the bound in (9)) and the actual fidelity coincide for one
qubit, i.e. for d = 2 [20]. Also replacing positive semidefiniteness by bounded purity yields the same feasible set for
that particular case. Consequently the bound (12) reproduces one of the main results in [6]:

Corollary 1. In the single-qubit case (i.e. d = 2) the bound (12) exactly reproduces the maximum average fidelity in [6,
Equation (2.9)] and $\sigma^\sharp$ is the optimal estimator.

Finally, we want to emphasize that establishing bounds on the average fidelity by using the super-fidelity instead 
of the Fuchs-van de Graaf inequalities leads to strictly better results: 

Corollary 2. Let the dimension d and the distribution $d\rho$ over states be arbitrary. Then, the bound presented in Theorem 2
(Fuchs-van de Graaf inequality) is either trivial (i.e. equal to one) or it strictly majorizes the one presented in Theorem 3
(super-fidelity).

[Figure 1; horizontal axis: Number of Measurements]

FIG. 1. The average fidelity as a function of the number of single-shot measurements of the Haar uniform measurement. The
prior distribution here is also the Haar uniform measure on two qubits. The lines are the medians and shaded areas the
interquartile ranges over 100 trials.


III. NUMERICAL EXPERIMENTS 


Note that the fidelity achieved by any estimator is a lower bound on the one achieved by the optimal estimator. A
particularly convenient and generally well motivated [18] estimator is the mean of the distribution, $\bar\rho = \mathbb{E}_\rho[\rho]$. Our
findings underline that for distributions of states relevant to tomography, the mean is very nearly optimal. In the
context of tomography the mean is furthermore arguably the most convenient estimator, since every other quantity
of interest requires its calculation anyway.

Finding an analytical expression for the posterior distribution is a very challenging problem, let alone performing 
the multidimensional integrals required for the calculation of the expectations above. Thus, we turn to numerics. 
In particular, we use the Sequential Monte Carlo (SMC) algorithm, which has been successfully applied to quantum 
statistical problems in the context of dynamical parameter estimation [22-24] and quantum state estimation [25-27]. 
Also, this algorithm is available as an open-source implementation in Python [28]. 

Employing SMC allows us to perform the Bayesian updating and averaging. A complete and detailed discussion
of the algorithm appears in Ref. [23] and thus we will not repeat the details here, but we will sketch the idea. The
algorithm starts with a set of quantum states $\{\rho_j\}_{j=1}^n$, the elements of which are called particles. Here, $n = |\{\rho_j\}|$ is
the number of particles and controls the accuracy of the approximation. We approximate the prior distribution by
a weighted sum of Dirac delta-functions,

$$\Pr(\rho) \approx \sum_{j=1}^n w_j\,\delta(\rho - \rho_j). \qquad (14)$$

Bayes' rule then becomes

$$w_j \mapsto \Pr(\text{data}|\rho_j)\,w_j, \qquad (15)$$

followed by a normalization step. The SMC algorithm is designed to approximate expectation values, such that

$$\mathbb{E}_\rho[f(\rho)] \approx \sum_{j=1}^n w_j f(\rho_j) \qquad (16)$$

for any function f. In other words, the SMC algorithm allows us to efficiently compute the multidimensional
integrals with respect to the measure defined by the posterior probability distribution. We use this algorithm, as
implemented by [28], to numerically compute averages arising in simulated tomography experiments. By doing so,
we explore the efficacy of our claims for a variety of distributions relevant to practice and found natural in
experimentation.
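The reweighting and averaging steps sketched above can be written in a few lines. The following is a minimal sketch, assuming NumPy and a measurement outcome described by a POVM effect; it deliberately omits the resampling step of a full SMC implementation and does not reproduce the interface of the package cited as [28]. The function names are hypothetical.

```python
import numpy as np

def bayes_update(particles, weights, effect):
    """Multiply each weight by the likelihood Tr(effect rho_j), then renormalize."""
    particles = np.asarray(particles, dtype=complex)
    weights = np.asarray(weights, dtype=float)
    likelihoods = np.einsum("ab,jba->j", effect, particles).real
    updated = weights * likelihoods
    return updated / updated.sum()

def expectation(particles, weights, func):
    """Weighted particle approximation of the expectation of func(rho)."""
    return sum(w * func(rho) for rho, w in zip(particles, weights))

# Two hypothetical particles with a uniform prior; the observed outcome is |0><0|.
particles = [np.diag([1.0, 0.0]).astype(complex), np.eye(2, dtype=complex) / 2]
prior = np.array([0.5, 0.5])
effect = np.diag([1.0, 0.0]).astype(complex)

posterior = bayes_update(particles, prior, effect)
posterior_mean = expectation(particles, posterior, lambda rho: rho)
print(posterior, np.real(np.diag(posterior_mean)))
```

Passing the identity function to `expectation` gives the posterior mean, which is the estimator studied in the remainder of this section.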

Recall the sharp distinction between measures supported on pure states and those with full support. We use the
fact that Theorem 1 provides us with the optimal estimator in the former case to lend support to the claim that
the mean estimator is a good candidate for a computationally simple, yet still near-optimal, alternative to solving
the optimization problem in general. In Figure 1, we present the results of numerical simulations on two qubits.
Plotted is the average fidelity achieved by the optimal estimator (see Theorem 1) and the mean estimator $\mathbb{E}_\rho[\rho]$.
The average is taken with respect to a distribution that begins as the Haar invariant measure on pure states and is
updated through simulated measurement data, where the measurement is the "uniform POVM" consisting of all
pure states, distributed uniformly according to the Haar measure. For independent measurements (i.e. local,
non-adaptive ones) this measurement is optimal [29, Theorem 3.1]. We see that the mean estimator's fidelity tracks the
optimal fidelity quite well.

FIG. 2. These plots depict the average fidelity as a function of the number of single-shot measurements of the Haar uniform
measurements.

First column: The prior distribution here is the Hilbert-Schmidt measure on two and three qubit mixed quantum states.

Second column: The prior distribution for the upper plot is the Arcsine distribution while for the lower plot the Bures
distribution was used; both are supported on two qubit mixed quantum states (again, see [30] for a review of distributions of density
matrices).

In all cases, the solid lines are the medians and shaded areas illustrate the interquartile ranges over 100 trials.

In Figure 2, we plot the average fidelity of the mean estimator against our bound (12) for measures supported also 
on mixed quantum states. Again, we simulate measurement data to get an accurate sense of how well the average 
fidelity of the mean estimator performs with respect to our bound for distributions relevant to tomography. In this 
case, the prior distribution is either the Hilbert-Schmidt measure (left column), or the arcsine and Bures distributions 
[30] for two qubits (right column). In each case, many other natural distributions appear as we update our prior 
through Bayes' rule. We see again that the mean estimator is a "good" estimator in that it comes close to the bound 
on the optimal fidelity and is the easiest non-trivial average quantity to evaluate. 


IV. PROOFS 


In this section we provide detailed derivations and proofs of the statements presented in Section II.

A. A detailed proof of Theorem 2


Recall that in Theorem 2 we have claimed that the bound

$$\max_{\sigma \in S} \mathbb{E}_\rho[F(\rho,\sigma)] \leq 1 - \frac{1}{4}\mathrm{Tr}\left(\mathbb{E}_\rho[\rho^2] - \mathbb{E}_\rho[\rho]^2\right) \qquad (17)$$

is valid for any prior distribution $d\rho$. In order to derive such a statement, we start with inequality (7)

$$F(\rho,\sigma) \leq 1 - \frac{1}{4}\|\rho-\sigma\|_2^2,$$

which is a direct combination of the Fuchs-van de Graaf inequalities and the norm inequality $\|\cdot\|_2 \leq \|\cdot\|_1$. As such
it is valid for any two states $\rho,\sigma \in S$, which in turn assures that it remains valid upon taking expectations over $d\rho$ on
both sides:

$$\mathbb{E}_\rho[F(\rho,\sigma)] \leq 1 - \frac{1}{4}\mathbb{E}_\rho\left[\|\rho-\sigma\|_2^2\right]. \qquad (18)$$

Moreover, we can optimize over $\sigma$ on both sides to obtain

$$\max_{\sigma \in S}\mathbb{E}_\rho[F(\rho,\sigma)] \leq 1 - \frac{1}{4}\min_{\sigma \in S}\mathbb{E}_\rho\left[\|\rho-\sigma\|_2^2\right]. \qquad (19)$$


The minimum on the right-hand side can in fact be calculated analytically. To this end, we define the function

$$f(\sigma) := \mathbb{E}_\rho\left[\|\rho-\sigma\|_2^2\right] = \mathrm{Tr}\left(\mathbb{E}_\rho[\rho^2]\right) - 2\mathrm{Tr}\left(\mathbb{E}_\rho[\rho]\,\sigma\right) + \mathrm{Tr}(\sigma^2).$$

Note that $f(\sigma)$ is convex, because it corresponds to a weighted average of convex norm-functions $\|\sigma-\rho\|_2^2$, and its
matrix-valued derivative corresponds to

$$f'(\sigma) = -2\mathbb{E}_\rho[\rho] + 2\sigma. \qquad (20)$$

This derivative vanishes if and only if $\sigma^\sharp = \mathbb{E}_\rho[\rho]$ holds, and convexity of $f(\sigma)$ implies that this critical state
corresponds to the unique minimum. The corresponding function value amounts to

$$f(\sigma^\sharp) = \mathrm{Tr}\left(\mathbb{E}_\rho[\rho^2]\right) - \mathrm{Tr}\left(\mathbb{E}_\rho[\rho]^2\right), \qquad (21)$$

and reinserting this global minimum into (19) yields the desired bound (17).
B. A detailed derivation of Theorem 3 


Our main theoretical statement, Theorem 3, follows from a three step procedure which was already briefly
outlined in Section II.

The first step invokes the concept of super-fidelity [20] which assures

$$\max_{\sigma \in S}\mathbb{E}_\rho[F(\rho,\sigma)] \leq \max_{\sigma \in S}\left(\mathrm{Tr}(\bar\rho\sigma) + p_\rho\sqrt{1-\mathrm{Tr}(\sigma^2)}\right),$$

with $\bar\rho = \mathbb{E}_\rho[\rho]$ and $p_\rho = \mathbb{E}_\rho\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]$ for any distribution $d\rho$. As it turns out, the optimization on the right
hand side of this equation is much more tractable than the original problem on the left hand side. This is manifested
by the following technical statement which is a direct consequence of the celebrated Birkhoff-von Neumann theorem.

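Lemma 1. Fix any $p_\rho > 0$ and suppose that $\bar\rho \in S$ is an arbitrary density operator with eigenvalue decomposition
$\bar\rho = \sum_{i=1}^d r_i |b_i\rangle\langle b_i|$. Then the optimization

$$\begin{aligned} \underset{\sigma \in L(\mathbb{C}^d)}{\text{maximize}} \quad & \mathrm{Tr}(\bar\rho\sigma) + p_\rho\sqrt{1-\mathrm{Tr}(\sigma^2)} \\ \text{subject to} \quad & \sigma \geq 0, \quad \mathrm{Tr}(\sigma) = 1, \end{aligned} \qquad (22)$$

is equivalent to solving

$$\begin{aligned} \underset{s_1,\ldots,s_d \in \mathbb{R}}{\text{maximize}} \quad & \sum_{i=1}^d r_i s_i + p_\rho\sqrt{1 - \sum_{i=1}^d s_i^2} \\ \text{subject to} \quad & \sum_{i=1}^d s_i = 1, \\ & s_i \geq 0, \quad 1 \leq i \leq d. \end{aligned} \qquad (23)$$

Moreover, there is a one-to-one correspondence between any feasible array $(s_1,\ldots,s_d)$ of this problem and the density operator
$\sigma = \sum_{i=1}^d s_i |b_i\rangle\langle b_i|$.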

Proof. At the heart of this statement is an immediate corollary of the Birkhoff-von Neumann Theorem; see e.g. [21,
Theorem 8.7.6]. For $d \times d$ Hermitian matrices $\bar\rho, \sigma$ this corollary assures

$$\mathrm{Tr}(\bar\rho\sigma) \leq \sum_{i=1}^d r_i s_i, \qquad (24)$$

where $r_i$ and $s_i$ denote the eigenvalues of $\bar\rho$ and $\sigma$, respectively, arranged in non-increasing order. If $\bar\rho$ has eigenvalue
decomposition $\bar\rho = \sum_{i=1}^d r_i |b_i\rangle\langle b_i|$, the right hand side of (24) corresponds to $\mathrm{Tr}(\bar\rho\tilde\sigma)$ where $\tilde\sigma = \sum_{i=1}^d s_i |b_i\rangle\langle b_i|$. Clearly,
if $\sigma \in S$ was a quantum state to begin with, so is $\tilde\sigma$, because the spectra of $\sigma$ and $\tilde\sigma$ coincide. Moreover, such a
definition assures that both states have equal purity, i.e. $\mathrm{Tr}(\sigma^2) = \mathrm{Tr}(\tilde\sigma^2)$. Consequently, for any feasible point $\sigma$ of
the optimization (22), there is a $\tilde\sigma$ of the above form which admits a value at least as large in the optimization. Inserting the
particular form of $\tilde\sigma$ into this program results in (23). □
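The rearrangement step in this proof can be illustrated numerically. The sketch below, assuming NumPy and randomly generated density matrices (an illustrative choice, not one of the measures used in the paper), builds the operator that has the spectrum of one state but is diagonal in the eigenbasis of the other, with eigenvalues paired in the same order.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density_matrix(d):
    """Random density matrix G G^dagger / Tr(G G^dagger) with complex Gaussian G."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = g @ g.conj().T
    return m / np.trace(m).real

def aligned_state(rho, sigma):
    """State with the spectrum of sigma, diagonal in the eigenbasis of rho."""
    _, basis = np.linalg.eigh(rho)   # eigenvectors ordered by ascending eigenvalue of rho
    s = np.linalg.eigvalsh(sigma)    # ascending eigenvalues of sigma, paired in the same order
    return (basis * s) @ basis.conj().T

rho, sigma = random_density_matrix(4), random_density_matrix(4)
sigma_tilde = aligned_state(rho, sigma)

overlap = np.trace(rho @ sigma).real
aligned_overlap = np.trace(rho @ sigma_tilde).real
print(overlap <= aligned_overlap + 1e-12)
```

The aligned operator has the same purity as the original one, so only the linear term of the objective changes, and it can only increase.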

In order to arrive at the bound presented in Theorem 3, we employ one more relaxation which is going to allow us
to solve the resulting problem analytically in full generality. To be concrete, we replace the non-negativity constraints
($s_i \geq 0$) in (23) by the weaker demand that the optimization vector $(s_1,\ldots,s_d)^T \in \mathbb{R}^d$ is contained in the Euclidean
unit ball, i.e. $\sum_{i=1}^d s_i^2 \leq 1$. Note that we explore the geometric properties of such a relaxation in Section V. In a
nutshell it corresponds to the tightest possible elliptical relaxation of the feasible set in (22). By doing so, we arrive
at the problem

$$\begin{aligned} \underset{s_1,\ldots,s_d \in \mathbb{R}}{\text{maximize}} \quad & \sum_{i=1}^d r_i s_i + p_\rho\sqrt{1 - \sum_{i=1}^d s_i^2} \\ \text{subject to} \quad & \sum_{i=1}^d s_i = 1, \quad \sum_{i=1}^d s_i^2 \leq 1, \end{aligned} \qquad (25)$$

which can be solved analytically via the method of Lagrangian multipliers:

which can be solved analytically via the method of Lagrangian multipliers: 

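Lemma 2. Let $r_1,\ldots,r_d$ denote the eigenvalues of any density operator $\bar\rho$ and fix $p_\rho > 0$. Then the problem (25) has a unique
solution. The optimal value corresponds to

$$\frac{1}{d}\left(1 + \sqrt{d-1}\,\sqrt{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1}\right)$$

and the array $(s_1^\sharp,\ldots,s_d^\sharp)$ achieving this optimum corresponds to the particular matrix

$$\sigma^\sharp = \frac{1}{d}\mathbb{1} + \sqrt{\frac{d-1}{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1}}\left(\bar\rho - \frac{1}{d}\mathbb{1}\right). \qquad (26)$$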


Note that this result together with the relaxations outlined in this section immediately implies Theorem 3 upon
inserting the definitions of $p_\rho$ and $\bar\rho$. The assumption $p_\rho > 0$ is furthermore non-critical, because, by definition,
$p_\rho = 0$ if and only if $d\rho$ is supported exclusively on pure states. This particular case, however, is already fully
covered by Theorem 1.

Proof of Lemma 2. Throughout this proof we shall represent the eigenvalues of the density operator $\bar\rho$ as a vector
$r = (r_1,\ldots,r_d)^T \in \mathbb{R}^d$. Likewise we shall collect the scalar optimization variables $s_i$ in the vector $s \in \mathbb{R}^d$.
Furthermore, let $\mathbf{0} = (0,\ldots,0)^T$ and $\mathbf{1} = (1,\ldots,1)^T$ denote the "all-zeros" and "all-ones" vectors on $\mathbb{R}^d$, respectively.
For $x, y \in \mathbb{R}^d$, we will also make use of the standard inner product $\langle x, y\rangle = \sum_{i=1}^d x_i y_i$, and the vectorial inequality
$x \geq y$ shall indicate component-wise inequality, i.e. $x_i \geq y_i$ for all $1 \leq i \leq d$.

In such a vectorial form, the optimization problem (25) corresponds to

$$\begin{aligned} \text{maximize} \quad & f(s) = \langle r, s\rangle + p_\rho\sqrt{1 - \langle s, s\rangle} \\ \text{subject to} \quad & g(s) = \langle \mathbf{1}, s\rangle - 1 = 0, \\ & \langle s, s\rangle \leq 1. \end{aligned} \qquad (27)$$

Note that (27) is a convex optimization problem, as it requires maximizing a concave function over a convex set.
As such, it has a unique maximum. One way of finding this maximum is to apply standard techniques such as the
Karush-Kuhn-Tucker (KKT) multiplier method [17], which are designed to take into account the inequality constraint
in (27).

However, here we opt for a less direct but considerably more convenient and less cumbersome approach: we
ignore the inequality constraint in (27) for now and employ the standard technique of Lagrangian multipliers (for
equality constraints) in order to find the unique critical point $s^\sharp$ of the optimization. In a second step, we are going to
verify that this vector strictly obeys the additional inequality constraint we have ignored so far, i.e. $\langle s^\sharp, s^\sharp\rangle < 1$. This
in turn implies that said inequality constraint is not active at the critical point, which in retrospect confirms that we
were in fact right to ignore it in the first place. Finally, the fact that we face a convex optimization problem assures
that this unique critical point indeed yields the sought for global maximum of (27).

In order to find the critical point $s^\sharp$ in question we define the Lagrangian function

$$L(s) = f(s) + \lambda g(s), \qquad (28)$$

where we have, as already announced, ignored the inequality constraint $\langle s, s\rangle \leq 1$. As a consequence, $\lambda \in \mathbb{R}$
denotes the single Lagrangian multiplier associated with the remaining normalization constraint. The necessary
condition for an optimal solution of (27) then reads

$$r - \frac{p_\rho\, s}{\sqrt{1 - \langle s, s\rangle}} + \lambda\mathbf{1} = \mathbf{0}. \qquad (29)$$


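Taking the inner product of this vector-identity with the "all-ones" vector $\mathbf{1}$ results in

$$0 = \langle\mathbf{1},\mathbf{0}\rangle = \langle\mathbf{1}, r\rangle - \frac{p_\rho\langle\mathbf{1}, s\rangle}{\sqrt{1-\langle s,s\rangle}} + \lambda\langle\mathbf{1},\mathbf{1}\rangle = 1 - \frac{p_\rho}{\sqrt{1-\langle s,s\rangle}} + d\lambda, \qquad (30)$$

where we have used $\langle\mathbf{1}, r\rangle = \sum_{i=1}^d r_i = \mathrm{Tr}(\bar\rho) = 1$ and the normalization constraint, which likewise assures $\langle\mathbf{1}, s\rangle = 1$.
This equation allows us to replace $\sqrt{1-\langle s,s\rangle}$ by $\frac{p_\rho}{1+d\lambda}$, and reinserting this into (29) results in the equivalent vector
equation

$$r - (1 + d\lambda)\,s + \lambda\mathbf{1} = \mathbf{0}. \qquad (31)$$

This can be readily inverted to yield

$$s = \frac{1}{1+d\lambda}\left(r + \lambda\mathbf{1}\right). \qquad (32)$$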

In order to determine the value of $\lambda$, we revisit (30) which in combination with (32) demands

$$\begin{aligned} p_\rho^2 = (1+d\lambda)^2\left(1 - \langle s,s\rangle\right) &= (1+d\lambda)^2 - \langle r,r\rangle - 2\lambda\langle\mathbf{1},r\rangle - \lambda^2\langle\mathbf{1},\mathbf{1}\rangle \qquad (33) \\ &= d(d-1)\lambda^2 + 2(d-1)\lambda + 1 - \mathrm{Tr}(\bar\rho^2), \qquad (34) \end{aligned}$$

where we have once more used $\langle\mathbf{1}, r\rangle = 1$ as well as $\langle r, r\rangle = \sum_{i=1}^d r_i^2 = \mathrm{Tr}(\bar\rho^2)$. This results in the quadratic equation

$$\lambda^2 + \frac{2}{d}\lambda = \frac{p_\rho^2 + \mathrm{Tr}(\bar\rho^2) - 1}{d(d-1)} \qquad (35)$$

for $\lambda$ whose two possible solutions correspond to

$$\lambda_\pm = \frac{1}{d}\left(-1 \pm \sqrt{\frac{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1}{d-1}}\right). \qquad (36)$$

Note that the argument of the square-root is non-negative, because the purity $\mathrm{Tr}(\bar\rho^2)$ of any quantum state is
lower-bounded by 1/d. Also, the second solution $\lambda_-$ is vacuous, since it leads to an immediate contradiction. Indeed, it
follows by inspection that $\lambda_- < -1/d$ holds. Together with (30) this implies the contradictory relation

$$\sqrt{1 - \langle s,s\rangle} = \frac{p_\rho}{1 + d\lambda_-} < 0, \qquad (37)$$

because $p_\rho$ is positive by assumption.

Consequently we are left with one meaningful value $\lambda_+$ for the Lagrangian multiplier and inserting it into (32)
yields the unique critical solution

$$s^\sharp = \frac{1}{d}\mathbf{1} + \sqrt{\frac{d-1}{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1}}\left(r - \frac{1}{d}\mathbf{1}\right). \qquad (38)$$

Recall that throughout this proof we are exploiting a one-to-one correspondence between vectors $s = (s_1,\ldots,s_d)^T \in \mathbb{R}^d$
and hermitian $d \times d$-matrices $\sigma = \sum_{i=1}^d s_i |b_i\rangle\langle b_i|$ that commute with $\bar\rho$. Consequently, the critical vector $s^\sharp$
corresponds to the critical matrix presented in (26).

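Plugging the critical point $s^\sharp$ into the objective function $f(s)$ furthermore yields the corresponding critical function
value

$$\begin{aligned} f(s^\sharp) = \langle r, s^\sharp\rangle + p_\rho\sqrt{1 - \langle s^\sharp, s^\sharp\rangle} &= \frac{\langle r, r\rangle + \lambda_+\langle\mathbf{1}, r\rangle}{1 + d\lambda_+} + \frac{p_\rho^2}{1 + d\lambda_+} \\ &= \frac{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1 + 1 + d\lambda_+}{d\,(1 + d\lambda_+)} \\ &= \frac{1}{d}\left(1 + \sqrt{d-1}\,\sqrt{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1}\right), \end{aligned}$$

where we have once more replaced $\sqrt{1 - \langle s^\sharp, s^\sharp\rangle}$ by $\frac{p_\rho}{1+d\lambda_+}$ and combined that with the fact that
$(1 + d\lambda_+) = \sqrt{\frac{d\left(p_\rho^2 + \mathrm{Tr}(\bar\rho^2)\right) - 1}{d-1}}$ holds.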

With such a unique critical point $s^\sharp$ at hand, we are now ready to show that it strictly obeys the inequality constraint
$\langle s^\sharp, s^\sharp\rangle \leq 1$ we have ignored so far. By employing the same equalities we have used in the previous paragraph, we can
readily establish such a claim:

$$\langle s^\sharp, s^\sharp\rangle = 1 - \left(1 - \langle s^\sharp, s^\sharp\rangle\right) = 1 - \frac{p_\rho^2}{(1 + d\lambda_+)^2} < 1. \qquad (39)$$

The strict inequality on the right follows from the fact that $p_\rho > 0$ holds by assumption. This indeed establishes
that $s^\sharp$ is also a critical point of the optimization problem (27). Since this optimization corresponds to maximizing a
concave function over a convex set, the unique critical point $s^\sharp$ must correspond to the unique maximum of (27). □
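The closed-form solution derived in this proof can be checked with plain arithmetic. The sketch below takes a vector of eigenvalues and a positive value for $p_\rho$ (both hypothetical inputs chosen for illustration), computes the positive multiplier, the critical vector and the objective value, and compares the latter with the closed-form optimum of Lemma 2. It assumes d ≥ 2.

```python
import math

def solve_relaxed_problem(r, p):
    """Critical point of <r,s> + p*sqrt(1 - <s,s>) subject to sum(s) == 1."""
    d = len(r)
    purity = sum(x * x for x in r)                            # Tr(rho_bar^2)
    root = math.sqrt((d * (p * p + purity) - 1) / (d - 1))    # equals 1 + d*lambda_plus
    lam = (root - 1) / d                                      # the multiplier lambda_plus
    s = [(x + lam) / (1 + d * lam) for x in r]                # critical vector
    value = sum(x * y for x, y in zip(r, s)) + p * math.sqrt(1 - sum(y * y for y in s))
    closed_form = (1 + math.sqrt(d - 1) * math.sqrt(d * (p * p + purity) - 1)) / d
    return s, value, closed_form

r = [0.5, 0.3, 0.2]   # hypothetical eigenvalues of the mean state
p = 0.4               # hypothetical value of p_rho
s_sharp, value, closed_form = solve_relaxed_problem(r, p)
print(s_sharp, value, closed_form)
```

The critical vector sums to one and has squared norm strictly below one, so the ignored inequality constraint is indeed inactive in this example.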


C. Detailed proofs of Corollary 1 and Corollary 2 

We conclude the proof section by providing detailed proofs of the remaining statements, namely that Theorem 3
reproduces the main result in [6] for the particular case of a single qubit, i.e. d = 2 (Corollary 1), and that the bounds
presented in Theorem 3 are strictly better than the ones outlined in Theorem 2 (Corollary 2).
























Proof of Corollary 1. We start this section by pointing out that in the particular case of dimension d = 2, the
two relaxations we have employed in the previous subsection are not relaxations at all. Indeed, for dimension
two, fidelity and super-fidelity coincide, and moreover the sets $\{(y_1,y_2)^T \in \mathbb{R}^2 : y_1 + y_2 = 1,\ y_1, y_2 \geq 0\}$ and
$\{(y_1,y_2)^T \in \mathbb{R}^2 : y_1 + y_2 = 1,\ y_1^2 + y_2^2 \leq 1\}$ coincide (this one-to-one correspondence is illustrated in Figure 3
below). These low-dimensional equivalences assure that all the relaxations employed in the derivation of Theorem 3
are actually tight. Consequently, in this particular low-dimensional case, we solve the actual problem of interest.

For deducing the claimed statement from this fact, we consider Equation (2.9) in [6]:

$$F = \frac{1}{2}\left(1 + \sum_x \|V_x\|_2\right). \qquad (40)$$

Here x simply means the data generated via the measurement. The vector $V_x$ is defined as follows:

$$V_x = \mathbb{E}_\rho\left[\underline{r}\,\Pr(x|\rho)\right], \qquad (41)$$

where $\underline{r}$ is related to the usual Bloch vector $r = (x,y,z)$ via

$$\underline{r} = \left(\sqrt{1 - \|r\|_2^2},\ x,\ y,\ z\right)^T. \qquad (42)$$

We point out that this F is not the same average fidelity we have considered but the following quantity (which
corresponds to our Eq. (4) above):

$$F = \max_\sigma \mathbb{E}_\rho\left[\mathbb{E}_{x|\rho}\left[F(\rho,\sigma(x))\right]\right]. \qquad (43)$$

Note however that, by employing Bayes' rule, this is equal to

$$F = \max_\sigma \mathbb{E}_x\left[\mathbb{E}_{\rho|x}\left[F(\rho,\sigma(x))\right]\right], \qquad (44)$$

and thus maximizing the posterior average fidelity is equivalent to maximizing the total average fidelity. Our bound
applies directly to the former but trivially extends to the latter.

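Thus, to establish Corollary 1, we need to extract the posterior average fidelity from the expressions above. First,
using Bayes' rule, we calculate

$$V_x = \Pr(x)\,\mathbb{E}_{\rho|x}\left[\underline{r}\right]. \qquad (45)$$

Using the fact that $\|r\|_2^2 = 2\mathrm{Tr}(\rho^2) - 1$ and

$$\mathrm{Tr}\left(\mathbb{E}_{\rho|x}[\rho]^2\right) = \frac{1}{2}\left(1 + \left\|\mathbb{E}_{\rho|x}[r]\right\|_2^2\right), \qquad (46)$$

we find

$$\|V_x\|_2^2 = \Pr(x)^2\left(2\,\mathbb{E}_{\rho|x}\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]^2 + 2\,\mathrm{Tr}\left(\mathbb{E}_{\rho|x}[\rho]^2\right) - 1\right). \qquad (47)$$

Plugging this back into (40), we have

$$\begin{aligned} F &= \frac{1}{2}\left[1 + \sum_x \Pr(x)\sqrt{2\,\mathbb{E}_{\rho|x}\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]^2 + 2\,\mathrm{Tr}\left(\mathbb{E}_{\rho|x}[\rho]^2\right) - 1}\,\right] \qquad (48) \\ &= \frac{1}{2}\left[1 + \mathbb{E}_x\left[\sqrt{2\,\mathbb{E}_{\rho|x}\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]^2 + 2\,\mathrm{Tr}\left(\mathbb{E}_{\rho|x}[\rho]^2\right) - 1}\,\right]\right] \qquad (49) \\ &= \mathbb{E}_x\left[\frac{1}{2}\left(1 + \sqrt{2\left(\mathbb{E}_{\rho|x}\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]^2 + \mathrm{Tr}\left(\mathbb{E}_{\rho|x}[\rho]^2\right)\right) - 1}\right)\right]. \qquad (50) \end{aligned}$$

Thus, implied by the results of [6], the maximum posterior average fidelity (dropping the x for parallelism) is

$$\max_\sigma \mathbb{E}_\rho[F(\rho,\sigma)] = \frac{1}{2}\left(1 + \sqrt{2\left(\mathbb{E}_\rho\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right]^2 + \mathrm{Tr}\left(\mathbb{E}_\rho[\rho]^2\right)\right) - 1}\right). \qquad (51)$$

This coincides with our main result (12) for dimension d = 2.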

Proof of Corollary 2. For notational simplicity, let us introduce the short-hand notation

$$s_\rho := \mathrm{Tr}\left(\mathbb{E}_\rho[\rho^2]\right) - \mathrm{Tr}\left(\mathbb{E}_\rho[\rho]^2\right), \qquad (52)$$

such that the bound presented in Theorem 2 simply reads $\max_{\sigma \in S}\mathbb{E}_\rho[F(\rho,\sigma)] \leq 1 - \frac{1}{4}s_\rho$. Note furthermore that
$0 \leq s_\rho \leq 1$ holds. As already mentioned, the lower bound follows from invoking Jensen's inequality, while the
upper bound is a simple consequence of the fact that the purity of any state is at most one. A vanishing $s_\rho$ would
correspond to a trivial Fuchs-van de Graaf bound of one, which is the first case covered by Corollary 2.
Therefore we can from now on safely assume that $s_\rho > 0$ holds. Under this assumption we prove the second claim
by starting with the bound presented in Theorem 3 and upper-bounding it via a chain of inequalities which will
ultimately lead to the bound presented in Theorem 2. Indeed, pick any dimension d and an arbitrary distribution $d\rho$
over states. Then Jensen's inequality assures


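$$\mathbb{E}_\rho\left[\sqrt{1-\mathrm{Tr}(\rho^2)}\right] \le \sqrt{1-\mathbb{E}_\rho\left[\mathrm{Tr}(\rho^2)\right]} \tag{53}$$

and the right hand side of expression (12) in Theorem 3 can be upper-bounded by

$$\frac{1}{d} + \frac{\sqrt{d-1}}{d}\sqrt{d - 1 - d\,s_\rho}, \tag{54}$$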

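because the square root function is monotonically increasing on the positive reals. Adding and subtracting $s_\rho$ in the last square root and once more invoking monotonicity allows us to continue via

$$\begin{aligned}
\frac{1}{d} + \frac{\sqrt{d-1}}{d}\sqrt{d - 1 - d\,s_\rho} &= \frac{1}{d} + \frac{\sqrt{d-1}}{d}\sqrt{(d-1)(1-s_\rho) - s_\rho} \\
&< \frac{1}{d} + \frac{d-1}{d}\sqrt{1-s_\rho},
\end{aligned} \tag{55}$$

where we have used $s_\rho > 0$ in the last line to obtain strict inequality. Since the square root is a concave function, the inequality $\sqrt{1-s_\rho} \le 1 - \frac{1}{2}s_\rho$ is valid for any $s_\rho \le 1$ and consequently

$$\frac{1}{d} + \frac{d-1}{d}\sqrt{1-s_\rho} \le 1 - \frac{d-1}{2d}\,s_\rho \tag{56}$$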


is true. Finally, we use the simple fact that $\frac{d-1}{2d} \ge \frac{1}{4}$ holds for any $d \ge 2$ to arrive at $1 - \frac{1}{4}s_\rho$, which is just the Fuchs-van de Graaf bound. Since a strict inequality sign connects the expressions in (55), the claimed strict majorization follows. □
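The ordering established in Corollary 2 can be illustrated on random ensembles. The sketch below rests on an assumption: the bound (12) is taken in the form $\frac{1}{d} + \sqrt{\frac{d-1}{d}}\sqrt{\mathbb{E}_\rho[\sqrt{1-\mathrm{Tr}(\rho^2)}]^2 + \mathrm{Tr}(\mathbb{E}_\rho[\rho]^2) - \frac{1}{d}}$, which is consistent with (54) and with the $d = 2$ expression, but is not quoted from Theorem 3 itself. The Fuchs-van de Graaf bound is taken as $1 - \frac{1}{4}s_\rho$, and the ensembles and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_state(d):
    """Random full-rank density matrix from a complex Gaussian matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    root = psd_sqrt(rho)
    vals = np.linalg.eigvalsh(root @ sigma @ root)
    return float(np.sum(np.sqrt(np.clip(vals, 0, None))) ** 2)

def compare(states, weights):
    """Return s_rho, the mean estimator's average fidelity, and both bounds."""
    d = states[0].shape[0]
    mean = sum(w * rho for w, rho in zip(weights, states))
    purities = np.array([np.trace(rho @ rho).real for rho in states])
    mean_purity = np.trace(mean @ mean).real
    s = float(weights @ purities - mean_purity)
    a = float(weights @ np.sqrt(1 - purities))
    # Assumed form of the bound (12); see the accompanying text.
    super_fid = 1 / d + np.sqrt((d - 1) / d) * np.sqrt(a**2 + mean_purity - 1 / d)
    fvdg = 1 - s / 4
    achieved = sum(w * fidelity(rho, mean) for w, rho in zip(weights, states))
    return s, achieved, super_fid, fvdg

results = {}
for d in (2, 3, 5, 8):
    states = [random_state(d) for _ in range(30)]
    weights = rng.dirichlet(np.ones(30))
    results[d] = compare(states, weights)

for d, (s, achieved, super_fid, fvdg) in results.items():
    print(f"d={d}: s={s:.4f} mean estimator={achieved:.4f} "
          f"bound (12)={super_fid:.4f} Fuchs-van de Graaf={fvdg:.4f}")
```

In each dimension the average fidelity of the mean estimator lies below the assumed form of (12), which in turn lies strictly below the Fuchs-van de Graaf value whenever $s_\rho > 0$.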


V. GEOMETRIC INTERPRETATION OF THE RELAXATION LEADING TO EQUATION (25)

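Recall that in order to arrive at Theorem 3, we have replaced the feasible set

$$\Delta^{d-1} = \left\{ s \in \mathbb{R}^d : \langle \mathbf{1}, s\rangle = 1,\ s \ge 0 \right\}, \tag{57}$$

of the optimization problem (11) by

$$\mathcal{E}_{\Delta^{d-1}} = \left\{ s \in \mathbb{R}^d : \langle \mathbf{1}, s\rangle = 1,\ \langle s, s\rangle \le 1 \right\}, \tag{58}$$

which is a convex outer approximation of $\Delta^{d-1}$. This follows from the basic fact that $x^2 \le x$ holds for any $x \in [0,1]$. Since the vector components $s_i$ of any $s \in \Delta^{d-1}$ have to obey $s_i \in [0,1]$, we can readily conclude

$$\langle s, s\rangle = \sum_{i=1}^d s_i^2 \le \sum_{i=1}^d s_i = 1. \tag{59}$$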
Note that the converse is true if and only if $d = 1, 2$, a fact which we have exploited in proving Corollary 1.
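A small numerical illustration of the inclusion (59) and of the failure of its converse for $d = 3$ is sketched here. The membership tests `in_simplex` and `in_relaxation` are hypothetical helpers introduced only for this example, and the vector $(2/3, 2/3, -1/3)$ is one convenient counterexample among many.

```python
import numpy as np

def in_simplex(s, tol=1e-12):
    """Membership in the standard simplex: entries sum to one and are nonnegative."""
    s = np.asarray(s, dtype=float)
    return bool(abs(s.sum() - 1) <= tol and np.all(s >= -tol))

def in_relaxation(s, tol=1e-12):
    """Membership in the relaxed set: entries sum to one and <s, s> <= 1."""
    s = np.asarray(s, dtype=float)
    return bool(abs(s.sum() - 1) <= tol and s @ s <= 1 + tol)

rng = np.random.default_rng(3)

# Every point of the simplex lies in the relaxed set (here for d = 4).
samples = rng.dirichlet(np.ones(4), size=1000)
all_inside = all(in_relaxation(s) for s in samples)

# For d = 3 the converse fails: this point has unit sum and unit norm
# but a negative entry.
counterexample = np.array([2.0, 2.0, -1.0]) / 3

# For d = 2 both sets coincide along the line s = (t, 1 - t).
grid = np.linspace(-1, 2, 301)
same_for_d2 = all(in_simplex([t, 1 - t]) == in_relaxation([t, 1 - t]) for t in grid)

print(all_inside, in_relaxation(counterexample), in_simplex(counterexample), same_for_d2)
```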

Geometrically, the former set corresponds to the standard simplex in $\mathbb{R}^d$. In this section we prove that the latter one is in fact the minimum volume covering ellipsoid of the standard simplex which furthermore corresponds to a $(d-1)$-dimensional Euclidean ball. For dimensions two and three this situation is illustrated in Figure 3.




Figure 3: The standard simplex $\Delta^{d-1}$ and its outer approximation $\mathcal{E}_{\Delta^{d-1}}$. Geometrically, the latter set corresponds to the minimum volume outer ellipsoid of the standard simplex. The figure illustrates this relation for dimensions $d = 2$ and $d = 3$. Note that for $d = 2$, the two sets coincide.


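Proposition 1 (Geometric nature of $\mathcal{E}_{\Delta^{d-1}}$). The convex outer approximation $\mathcal{E}_{\Delta^{d-1}}$ of the standard simplex $\Delta^{d-1}$ corresponds to a $(d-1)$-dimensional Euclidean ball with radius $\sqrt{\frac{d-1}{d}}$ and center $\frac{1}{d}\mathbf{1}$ which is contained in the $(d-1)$-dimensional hyperplane $\mathcal{H}_{\mathbf{1},1} := \left\{ s \in \mathbb{R}^d : \langle \mathbf{1}, s\rangle = 1 \right\}$.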


Proof. By definition, the set $\mathcal{E}_{\Delta^{d-1}}$ corresponds to the intersection of the Euclidean unit ball $B_1(0) = \left\{ s \in \mathbb{R}^d : \langle s, s\rangle \le 1 \right\}$ and the hyperplane $\mathcal{H}_{\mathbf{1},1}$. This assures $\mathcal{E}_{\Delta^{d-1}} \subset \mathcal{H}_{\mathbf{1},1}$ by construction.

One way to establish that $\mathcal{E}_{\Delta^{d-1}}$ is furthermore itself a Euclidean ball is to use "generalized cylindrical coordinates" for the Euclidean unit ball $B^d(0,1)$. Such coordinates use the fact that $B^d(0,1)$ is equivalent to the union of a family of $(d-1)$-dimensional Euclidean balls. More concretely: let $z \in \mathbb{R}^d$ be an arbitrary unit vector and let $\xi \in [-1,1]$ denote a parameter. For each value of this parameter, we define the hyperplane $\mathcal{H}_{z,\xi} = \left\{ s \in \mathbb{R}^d : \langle z, s\rangle = \xi \right\}$ which in particular contains the vector $\xi z$ by construction. Furthermore, let $B^{d-1}(z,\xi) \subset \mathcal{H}_{z,\xi}$ be the $(d-1)$-dimensional Euclidean ball with radius $\sqrt{1-\xi^2}$ and center $\xi z$ that is contained in the hyperplane $\mathcal{H}_{z,\xi}$. Clearly each element in such a union of sets is contained in the $d$-ball, and letting $\xi$ range from $-1$ to $1$ covers the entire $d$-ball. In order to see this, decompose any $s \in B^d(0,1)$ as $s = \langle s, z\rangle z + z^\perp$ such that $\langle z^\perp, z\rangle = 0$ and set $\xi = \langle s, z\rangle$. Pythagoras' theorem then assures $\|z^\perp\|_2 \le \sqrt{1-\xi^2}$ and consequently $s \in B^{d-1}(z,\xi)$.

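The structure of the particular problem at hand suggests to fix $z = \frac{1}{\sqrt{d}}\mathbf{1}$. Indeed, such a particular choice of $z$ assures equality of the hyperplane $\mathcal{H}_{\mathbf{1},1}$ which contains $\mathcal{E}_{\Delta^{d-1}}$ and the hyperplane $\mathcal{H}_{\frac{1}{\sqrt{d}}\mathbf{1},\frac{1}{\sqrt{d}}}$ we have just introduced. Consequently, the "cylindrical representation" of the Euclidean unit ball assures that the intersection $\mathcal{E}_{\Delta^{d-1}} = B_1(0) \cap \mathcal{H}_{\mathbf{1},1}$ corresponds to the $(d-1)$-ball $B^{d-1}\left(\frac{1}{\sqrt{d}}\mathbf{1}, \frac{1}{\sqrt{d}}\right)$ associated with the hyperplane $\mathcal{H}_{\frac{1}{\sqrt{d}}\mathbf{1},\frac{1}{\sqrt{d}}}$ and a parameter value $\xi = \frac{1}{\sqrt{d}}$. By definition, this ball has center $\frac{1}{d}\mathbf{1}$ and radius $\sqrt{1-\xi^2} = \sqrt{\frac{d-1}{d}}$, which completes the proof. □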
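Proposition 1 can be spot-checked numerically by sampling points of the hyperplane $\langle \mathbf{1}, s\rangle = 1$ and comparing the two membership conditions. This is a minimal sketch for an arbitrarily chosen dimension $d = 5$; the sampling scheme and the scale factor `0.6` are assumptions chosen only so that points fall both inside and outside the ball.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 5
center = np.full(d, 1 / d)
radius = np.sqrt((d - 1) / d)

# Random points on the hyperplane <1, s> = 1: the center plus a displacement
# whose entries sum to zero (i.e. orthogonal to the all-ones vector).
t = rng.normal(size=(2000, d))
t -= t.mean(axis=1, keepdims=True)
points = center + 0.6 * t

# Condition defining the relaxed set: the point lies in the unit ball.
in_unit_ball = np.einsum("ij,ij->i", points, points) <= 1
# Condition from Proposition 1: distance to (1/d) 1 is at most sqrt((d-1)/d).
in_small_ball = np.linalg.norm(points - center, axis=1) <= radius

print(in_unit_ball.sum(), in_small_ball.sum())
```

The two conditions select the same points because $\langle s, s\rangle = \frac{1}{d} + \|s - \frac{1}{d}\mathbf{1}\|^2$ on the hyperplane.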


The next statement establishes that our choice of replacing the original feasible set $\Delta^{d-1}$ in the proof of Theorem 3 by the larger convex set $\mathcal{E}_{\Delta^{d-1}}$ is in a precise sense the tightest possible elliptic relaxation of the original optimization problem.

Proposition 2. The set $\mathcal{E}_{\Delta^{d-1}}$ is the unique minimal volume covering ellipsoid of the standard simplex $\Delta^{d-1}$.

The proof exploits the following standard result about Löwner-John ellipsoids that is originally due to John. However, here we make use of a slightly more general version presented in [31].
Theorem 4 (Theorem 2.1 in [31]). Let $K \subset \mathbb{R}^d$ be a convex body and let $K$ be contained in the Euclidean unit ball $B^d(0)$. Then the following statements are equivalent:

1. $B^d(0)$ is the unique minimum volume ellipsoid containing $K$.

2. There exist contact points $u_1, \ldots, u_m$ lying both in the boundary of $K$ and $B^d(0)$, and positive numbers $\lambda_1, \ldots, \lambda_m$, $m \ge d$, such that

$$\sum_{i=1}^m \lambda_i u_i = 0 \quad \text{and} \quad \sum_{i=1}^m \lambda_i |u_i\rangle\langle u_i| = \mathbb{1}. \tag{60}$$


Proof. In Proposition 1 we have established that the set $\mathcal{E}_{\Delta^{d-1}}$ corresponds to a $(d-1)$-ball with radius $\sqrt{\frac{d-1}{d}}$ and center $\frac{1}{d}\mathbf{1}$ that (like the standard simplex) is contained in the hyperplane $\mathcal{H}_{\mathbf{1},1}$. A quick calculation reveals that all vertices of the standard simplex $\Delta^{d-1}$, which are just the standard basis vectors $e_1, \ldots, e_d$, have Euclidean distance $\sqrt{\frac{d-1}{d}}$ to the ball's center. Consequently they are contained in the boundary of the ball $\mathcal{E}_{\Delta^{d-1}}$ and we have found sufficiently many contact points for applying Theorem 4. Since volume is translationally invariant we can furthermore shift the coordinate origin into the point $\frac{1}{d}\mathbf{1}$ (which is the center of the ball $\mathcal{E}_{\Delta^{d-1}}$). This has the advantage that the affine space $\mathcal{H}_{\mathbf{1},1}$ containing both $\Delta^{d-1}$ and $\mathcal{E}_{\Delta^{d-1}}$ turns into $\mathcal{H}_{\mathbf{1},0}$ which is a linear subspace. Note that with respect to the (translated) standard basis, the orthogonal projection onto this subspace is given by

$$P = \mathbb{1} - \frac{1}{d}|\mathbf{1}\rangle\langle\mathbf{1}|.$$


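With respect to this new coordinate system, the $d$ contact points (vertices of the simplex) amount to $\tilde{e}_i = e_i - \frac{1}{d}\mathbf{1}$. Choosing unit weights $\lambda_i = 1$ for all $m = d$ contact points $u_i = \tilde{e}_i$ and calculating

$$\sum_{i=1}^m \lambda_i u_i = \sum_{i=1}^d \left(e_i - \frac{1}{d}\mathbf{1}\right) = \mathbf{1} - \mathbf{1} = 0 \tag{61}$$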


reveals that the first condition for Theorem 4 is fulfilled. A similar calculation reveals

$$\sum_{i=1}^m \lambda_i |u_i\rangle\langle u_i| = \mathbb{1} - \frac{1}{d}|\mathbf{1}\rangle\langle\mathbf{1}|.$$

This, however, equals just the projector $P$ onto the subspace $\mathcal{H}_{\mathbf{1},0}$ which contains the entire $(d-1)$-dimensional problem of interest. Restricted to its range, a projector corresponds to the identity, which establishes the second condition for Theorem 4. Since this statement is invariant under re-scaling, we can also apply it here, where the radius of the $(d-1)$-dimensional surrounding Euclidean ball is not one but $\sqrt{\frac{d-1}{d}}$. □
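The two conditions of Theorem 4 used in this proof can be verified directly for a concrete dimension. The sketch below picks $d = 4$ arbitrarily. The last step makes the re-scaling explicit in one possible way, by normalizing the contact points to unit length and using the weights $\lambda_i = \frac{d-1}{d}$; this particular choice of weights is an assumption of the example rather than something spelled out in the text.

```python
import numpy as np

d = 4
ones = np.ones(d)
center = ones / d
radius = np.sqrt((d - 1) / d)

# Shifted contact points u_i = e_i - (1/d) 1, stored as the rows of u.
u = np.eye(d) - center
# Orthogonal projector onto the subspace orthogonal to the all-ones vector.
P = np.eye(d) - np.outer(ones, ones) / d

distances = np.linalg.norm(u, axis=1)   # distance of each vertex to the center
first = u.sum(axis=0)                   # sum_i lambda_i u_i with lambda_i = 1
second = u.T @ u                        # sum_i lambda_i |u_i><u_i| with lambda_i = 1

# Re-scaled version: unit-length contact points with weights (d - 1) / d.
v = u / radius
lam = (d - 1) / d
second_unit = lam * (v.T @ v)

print(distances, first)
```

With these weights the trace of the second sum equals $d - 1$, the dimension of the subspace on which the projector acts as the identity.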


VI. CONCLUSION 

In this work we have derived upper bounds on the average fidelity of any estimator with no restrictions on the 
dimension or the distribution being averaged over. Furthermore, we have shown a sharp distinction in the optimization
problems of maximizing average fidelity between measures supported only on pure states and those with
full support. In the former case, we have provided the exact optimal estimator, while in both cases we argued based 
on numerical evidence that the mean estimator is a good proxy for the optimal solution. 

Interestingly, we found that the analytical bound (12) (which is based on super-fidelity [20]) is strictly tighter than 
a corresponding one obtained using the well known, and often used, Fuchs-van de Graaf inequalities [19]. 

These results have obvious applications to practical Bayesian quantum tomography [13], since the bound can be
computed online, that is, it is only a property of the current distribution under consideration. But we also expect our
bound to be of interest in other theoretical work on tomography, where a benchmark is needed to make statements 
about absolute average performance of some candidate protocol. 